Comparing Weighting Models for Monolingual Information Retrieval
Abstract
Motivated by the hypothesis that the retrieval performance of a weighting model is independent of the language in which queries and collection are expressed, we compared the retrieval performance of three weighting models, i.e., Okapi, statistical language modeling (SLM), and deviation from randomness (DFR), on three monolingual test collections, i.e., French, Italian, and Spanish. The DFR model was found to consistently achieve better results than both Okapi and SLM, whose performance was comparable. We also evaluated whether the use of retrieval feedback improved retrieval performance; retrieval feedback was beneficial for DFR and Okapi and detrimental for SLM. Besides relative performance, DFR with retrieval feedback achieved excellent absolute results: best run for Italian and Spanish, third run for French.
Similar resources
EXETER at CLEF 2002: Experiments with Machine Translation for Monolingual and Bilingual Retrieval
This year, the University of Exeter participated in both the CLEF 2002 monolingual and bilingual task for two languages: Italian and Spanish. We submitted 4 ranked results each for both Italian and Spanish Monolingual tasks and 5 each for the bilingual tasks. We report experimental results from our investigations of merging topic translations from two machine translation (MT) systems and recent...
ITC-irst at CLEF 2000: Italian Monolingual Track
This paper presents work on document retrieval for Italian carried out at ITC-irst. Two different approaches to information retrieval were investigated, one based on the Okapi weighting formula and one based on a statistical model. Development experiments were carried out using the Italian sample of the TREC-8 CLIR track. Performance evaluation was done on the Cross Language Evaluation Forum (C...
Italian Text Retrieval for CLEF 2000 at ITC-irst
This paper presents work on document retrieval for Italian carried out at ITC-irst. Two different approaches to information retrieval were investigated, one based on the Okapi weighting formula and one based on a statistical model. Development experiments were carried out using the Italian sample of the TREC-8 CLIR track. Performance evaluation was done on the Cross Language Evaluation Forum (C...
Translation Term Weighting and Combining Translation Resources in Cross-Language Retrieval
In TREC-10 the Berkeley group participated only in the English-Arabic cross-language retrieval (CLIR) track. One Arabic monolingual run and four English-Arabic cross-language runs were submitted. Our approach to the cross-language retrieval was to translate the English topics into Arabic using online English-Arabic bilingual dictionaries and machine translation software. The five official runs a...
University of Hagen at GeoCLEF 2008: Combining IR and QA for Geographic Information Retrieval
This paper describes the participation of GIRSA at GeoCLEF 2008, the geographic information retrieval task at CLEF. GIRSA is a modified and improved variant of the system which participated at GeoCLEF 2007. It combines results retrieved with methods from information retrieval (IR) on geographically annotated data and question answering (QA) employing query decomposition. For the monolingual Ger...